In this study I look at data from the 2016 Presidential election by candidate, total contributions, location of, and number of contributors. For each of these variables I try to parse the data and create some visualizations that sum up the data well. I end up creating a statististical summary with bar and line plots, overview of the data by plotting, on a world map, some of the aforementioned varibles, and a general box plot to differentiate between gender and party in the election.
setwd("C:/USers/Zohaib/Desktop/Lectures/Udacity/R") #Setting the correct working directory.
Contributions<-read.csv("ContributionsEdited.csv",header = TRUE,na.strings="",encoding="UTF-8") #Reading in the data.
library(plyr) #Setting up the libraries for the following code.
library(ggplot2)
library(gridExtra)
library(maps)
library(ggmap)
library(devtools)
library(dplyr)
for (i in c("cand_nm","election_tp")){print(sort(table(Contributions[i]),decreasing=TRUE))} # Gives table counts for
##
## Clinton, Hillary Rodham Sanders, Bernard
## 3506081 2063097
## Trump, Donald J. Cruz, Rafael Edward 'Ted'
## 782711 557581
## Carson, Benjamin S. Rubio, Marco
## 248227 104814
## Paul, Rand Bush, Jeb
## 32485 29459
## Fiorina, Carly Kasich, John R.
## 27793 25498
## Johnson, Gary Stein, Jill
## 13429 11020
## Walker, Scott Huckabee, Mike
## 7338 6460
## Christie, Christopher J. O'Malley, Martin Joseph
## 5998 5317
## Graham, Lindsey O. McMullin, Evan
## 4397 2540
## Santorum, Richard J. Lessig, Lawrence
## 1723 1339
## Perry, James R. (Rick) Webb, James Henry Jr.
## 908 800
## Jindal, Bobby Pataki, George E.
## 765 344
## Gilmore, James S III
## 88
##
## P2016 G2016 O2016 P2020 P2012 P2018 G2015 P2015 G2106
## 4789297 2634940 1891 76 3 3 2 2 1
## P2019
## 1
# each candidate and
# election.
Contributions$names<-lapply(as.character(Contributions$cand_nm),function(x) strsplit(x,",")[[1]][1])
Contributions$names<-factor(Contributions$names,levels=sort(as.vector(as.character(unique((Contributions$names)))))) #Creates and easier to use name variable, with just last names.
As expected, the most number of the contributions are going to the popular choice, Clinton, the fundraising upset Sanders, and election winner Trump. For the rest of the individuals, one can get a sense of how the election turned out just by looking at the differing number of contributions.
plotCont<-subset(Contributions,election_tp=="P2016"|election_tp=="G2016") #Creates data set to focus on 2016 election.
p1<-ggplot(aes(x=names,y=contb_receipt_amt,group=1),
data=plotCont)+
geom_bar(aes(fill='red'),stat="summary",fun.y=mean)+ #bar plot for mean of contbributions by candidate
facet_wrap(~election_tp,nrow = 2)+ #split into the primary and general elections.
geom_point(alpha=.5,size=.75,stat="summary",fun.y=median)+
geom_line(color='purple',stat="summary",fun.y=median)+ #line plot for median contributions.
labs(x="Candidate Names",y="Mean Contribution Amount ($)")+
theme(plot.title = element_text(size=22))+
guides(fill=FALSE)
p2<-ggplot(aes(x=names,y=contb_receipt_amt/1000000,group=1),
data=plotCont)+
ylab("Contribution Total ($ mil)")+ #line plot for sum of contributions by candidate split by primary and
ggtitle("Contribution by Candidate")+ #general elections.
geom_line(size=1.25,color='steelblue',stat="summary",fun.y=sum)+
theme(axis.title.x=element_blank())+
facet_wrap(~election_tp,nrow=2)+
guides(fill=FALSE)
grid.arrange(p2,p1)
From this one can notice something odd about the mean contributions as some of the individuals you would expect to have high mean and/or median contributions do not! Clinton, Sanders, and Trump all have low amounts in comparison.The three, on the other hand, do obviously have the highest total contributions, but one should find it very odd that some of the prominent candidates had low statistics, while someone like Jindal had or Lessig both had very high amounts. Then there are the negative means (most likely due to reimbursements), and contributions in the general election for people that were not even in the race such as Lessig who has a high mean amount of contributions then. Yet, when one looks at the count for Lessign he only has 1339 contributions so these numbers are a little less concerning.
ggplot(aes(x=contbr_occupation,y=contb_receipt_amt),
data=plotCont)+
geom_point(stat="summary",fun.y=mean)
ggplot(aes(x=contbr_city,y=contb_receipt_amt), #Two plots trying to create scatter plots by city and occupation versus
data=plotCont)+ #receipt amounts, but this obviously does not seem like the best plots.
geom_point(stat="summary",fun.y=mean)
These plots do not really say anything, and it is near impossible to do anything worth-while with them.
Contributions$Gender<-NA
Contributions$Party<-NA
Males=c("Rubio","Santorum","Perry","Carson","Cruz","Paul","Sanders","Huckabee",
"Pataki","O'Malley","Graham","Bush","Trump","Jindal","Christie",
"Walker","Webb","Kasich","Gilmore","Lessig","Johnson","McMullin")
Females=c("Clinton","Fiorina","Stein")
Republicans=c("Rubio","Santorum","Perry","Carson","Cruz","Fiorina","Paul","Huckabee",
"Pataki","O'Malley","Graham","Bush","Trump","Jindal","Christie",
"Walker","Kasich","Gilmore")
Democrats=c("Clinton","Sanders","Webb","Lessig")
Others=c("Stein","Johnson","McMullin") #Creating Gender and Party variables based off of a list of the names that
#fall in each category.
for (i in Contributions$names){
if (i %in% Males) {
Contributions$Gender="M"
}
else if (i %in% Females) {
Contributions$Gender="F"
}
else {
Contributions$Gender=NA}}
for (i in Contributions$names){
if (i %in% Republicans){
Contributions$Party="R"
}
else if (i %in% Democrats) {
Contributions$Party="D"
}
else if (i %in% Others){
Contributions$Party="O"
}
else {
Contributions$Party=NA
}}
for (i in unique(Contributions$gender)){
print(c(i,quantile(x=Contributions[Contributions$gender==i,]$contb_receipt_amt,probs = .85)))
} #calculates 85th percentile for contributions by gender.
## 85%
## "M" "100"
## 85%
## "F" "100"
for (i in unique(Contributions$party)){
print(c(i,quantile(Contributions[Contributions$party==i,]$contb_receipt_amt,probs = .85)))
} #calculates 85th percentile for contributions by party.
## 85%
## "R" "250"
## 85%
## "D" "100"
## 85%
## "O" "500"
This data is just to get a sense of the data for the following plots, which need scaling manipulations.
ggplot(aes(x=gender,y=contb_receipt_amt),data=plotCont)+
geom_boxplot(aes(alpha=.1))+
coord_cartesian(ylim=c(quantile(Contributions[Contributions$gender=="F",]$contb_receipt_amt,probs = .25),
quantile(Contributions[Contributions$gender=="F",]$contb_receipt_amt,probs = .95)))+
guides(fill=FALSE)
ggplot(aes(x=party,y=contb_receipt_amt),data=plotCont)+
geom_boxplot(aes(alpha=.1))+
coord_cartesian(ylim=c(quantile(Contributions[Contributions$party=="R",]$contb_receipt_amt,probs=.25),
quantile(Contributions[Contributions$party=="R",]$contb_receipt_amt,probs=.95)))+
guides(fill=FALSE) #plotting boxplots for both contributions by gender and party with the 25th and 95th percentiles
#used to give a better picture of the plots.
Here differences between gender and party are shown in terms of contributions, although, oddly the “Other” party types have a higher median amount of contributions when compared to Republicans and Democrats, although both of the latter have very high outliers throughout the plots. This is probably due to the lower count of contributions that was donated to the Other parties.
namelist<-unique(Contributions$names)
for (i in namelist){
assign(i,subset(plotCont,names==i)%>% #Assigns each candidate name to a dataframe of cities and counts.
group_by(names,contbr_city,contbr_st) %>% #Grouping by city and state to get counts for each for each candidate.
summarize(n=n()) %>%
ungroup() %>%
ungroup())
i<-arrange(get(i),desc(n))
print(head(i,n = 10)) #printing tables of the top ten cities in terms of count of contributions from there.
}
## # A tibble: 10 x 4
## names contbr_city contbr_st n
## <fctr> <fctr> <fctr> <int>
## 1 Rubio MIAMI FL 2258
## 2 Rubio NEW YORK NY 1538
## 3 Rubio HOUSTON TX 1349
## 4 Rubio NAPLES FL 1258
## 5 Rubio DALLAS TX 1214
## 6 Rubio LOS ANGELES CA 874
## 7 Rubio WASHINGTON DC 858
## 8 Rubio ATLANTA GA 679
## 9 Rubio CHICAGO IL 645
## 10 Rubio BOCA RATON FL 608
## # A tibble: 10 x 4
## names contbr_city contbr_st n
## <fctr> <fctr> <fctr> <int>
## 1 Santorum DALLAS TX 59
## 2 Santorum PITTSBURGH PA 35
## 3 Santorum OVERLAND PARK KS 33
## 4 Santorum MCKINNEY TX 31
## 5 Santorum RICHFIELD OH 29
## 6 Santorum LAFAYETTE LA 25
## 7 Santorum GREAT FALLS VA 21
## 8 Santorum PALO ALTO CA 20
## 9 Santorum DARIEN CT 19
## 10 Santorum PANAMA CITY BEACH FL 19
## # A tibble: 10 x 4
## names contbr_city contbr_st n
## <fctr> <fctr> <fctr> <int>
## 1 Perry HOUSTON TX 69
## 2 Perry DALLAS TX 64
## 3 Perry AUSTIN TX 58
## 4 Perry LUBBOCK TX 38
## 5 Perry SAN ANTONIO TX 36
## 6 Perry FORT WORTH TX 23
## 7 Perry MIDLAND TX 18
## 8 Perry BAKERSFIELD CA 17
## 9 Perry FORT LAUDERDALE FL 12
## 10 Perry MISSION TX 12
## # A tibble: 10 x 4
## names contbr_city contbr_st n
## <fctr> <fctr> <fctr> <int>
## 1 Carson HOUSTON TX 1486
## 2 Carson SAN ANTONIO TX 1240
## 3 Carson DALLAS TX 1130
## 4 Carson COLORADO SPRINGS CO 1047
## 5 Carson PHOENIX AZ 1021
## 6 Carson TUCSON AZ 885
## 7 Carson LAS VEGAS NV 866
## 8 Carson SAN DIEGO CA 801
## 9 Carson CHARLOTTE NC 775
## 10 Carson CINCINNATI OH 686
## # A tibble: 10 x 4
## names contbr_city contbr_st n
## <fctr> <fctr> <fctr> <int>
## 1 Cruz HOUSTON TX 14933
## 2 Cruz SAN ANTONIO TX 5089
## 3 Cruz DALLAS TX 5014
## 4 Cruz AUSTIN TX 4044
## 5 Cruz FORT WORTH TX 3660
## 6 Cruz SPRING TX 3323
## 7 Cruz SAN DIEGO CA 1981
## 8 Cruz KATY TX 1967
## 9 Cruz LAS VEGAS NV 1897
## 10 Cruz ARLINGTON TX 1895
## # A tibble: 10 x 4
## names contbr_city contbr_st n
## <fctr> <fctr> <fctr> <int>
## 1 Paul HOUSTON TX 365
## 2 Paul AUSTIN TX 312
## 3 Paul SAN JOSE CA 294
## 4 Paul LAS VEGAS NV 196
## 5 Paul SEATTLE WA 187
## 6 Paul DALLAS TX 181
## 7 Paul SAN ANTONIO TX 179
## 8 Paul NEW YORK NY 175
## 9 Paul LOUISVILLE KY 165
## 10 Paul BOWLING GREEN KY 149
## # A tibble: 10 x 4
## names contbr_city contbr_st n
## <fctr> <fctr> <fctr> <int>
## 1 Clinton NEW YORK NY 155804
## 2 Clinton WASHINGTON DC 71610
## 3 Clinton LOS ANGELES CA 65166
## 4 Clinton SAN FRANCISCO CA 56833
## 5 Clinton BROOKLYN NY 56711
## 6 Clinton CHICAGO IL 46440
## 7 Clinton SEATTLE WA 44346
## 8 Clinton HOUSTON TX 33076
## 9 Clinton AUSTIN TX 30261
## 10 Clinton PORTLAND OR 27821
## # A tibble: 10 x 4
## names contbr_city contbr_st n
## <fctr> <fctr> <fctr> <int>
## 1 Sanders NEW YORK NY 40318
## 2 Sanders SEATTLE WA 35006
## 3 Sanders LOS ANGELES CA 31840
## 4 Sanders SAN FRANCISCO CA 31077
## 5 Sanders PORTLAND OR 29074
## 6 Sanders BROOKLYN NY 26807
## 7 Sanders CHICAGO IL 24410
## 8 Sanders AUSTIN TX 16387
## 9 Sanders SAN DIEGO CA 14864
## 10 Sanders WEST SOMERVILLE MA 13562
## # A tibble: 10 x 4
## names contbr_city contbr_st n
## <fctr> <fctr> <fctr> <int>
## 1 Fiorina HOUSTON TX 361
## 2 Fiorina DALLAS TX 291
## 3 Fiorina NEW YORK NY 279
## 4 Fiorina SAN DIEGO CA 219
## 5 Fiorina ATLANTA GA 196
## 6 Fiorina AUSTIN TX 196
## 7 Fiorina SCOTTSDALE AZ 171
## 8 Fiorina ALEXANDRIA VA 161
## 9 Fiorina LOS ANGELES CA 149
## 10 Fiorina MANDEVILLE LA 146
## # A tibble: 10 x 4
## names contbr_city contbr_st n
## <fctr> <fctr> <fctr> <int>
## 1 Huckabee LITTLE ROCK AR 114
## 2 Huckabee TEXARKANA AR 75
## 3 Huckabee AMARILLO TX 60
## 4 Huckabee OREGON CITY OR 56
## 5 Huckabee TEXARKANA TX 55
## 6 Huckabee HOUSTON TX 51
## 7 Huckabee CONWAY AR 48
## 8 Huckabee HYDE PARK MA 45
## 9 Huckabee DESTIN FL 44
## 10 Huckabee DALLAS TX 42
## # A tibble: 10 x 4
## names contbr_city contbr_st n
## <fctr> <fctr> <fctr> <int>
## 1 Pataki NEW YORK NY 58
## 2 Pataki LAS VEGAS NV 13
## 3 Pataki BRONX NY 11
## 4 Pataki SAN FRANCISCO CA 7
## 5 Pataki SAN JUAN PR 6
## 6 Pataki ALEXANDRIA VA 5
## 7 Pataki BROOKLYN NY 5
## 8 Pataki CARMEL NY 5
## 9 Pataki NAPLES FL 5
## 10 Pataki NEW ROCHELLE NY 5
## # A tibble: 10 x 4
## names contbr_city contbr_st n
## <fctr> <fctr> <fctr> <int>
## 1 O'Malley BALTIMORE MD 523
## 2 O'Malley WASHINGTON DC 234
## 3 O'Malley NEW YORK NY 181
## 4 O'Malley SILVER SPRING MD 135
## 5 O'Malley BETHESDA MD 119
## 6 O'Malley POTOMAC MD 94
## 7 O'Malley ANNAPOLIS MD 77
## 8 O'Malley ROCKVILLE MD 76
## 9 O'Malley CHICAGO IL 70
## 10 O'Malley SAN FRANCISCO CA 69
## # A tibble: 10 x 4
## names contbr_city contbr_st n
## <fctr> <fctr> <fctr> <int>
## 1 Graham GREENVILLE SC 296
## 2 Graham COLUMBIA SC 259
## 3 Graham NEW YORK NY 204
## 4 Graham CHARLESTON SC 125
## 5 Graham SPARTANBURG SC 83
## 6 Graham ALEXANDRIA VA 62
## 7 Graham CHAPIN SC 62
## 8 Graham MYRTLE BEACH SC 57
## 9 Graham HILTON HEAD ISLAND SC 56
## 10 Graham LOS ANGELES CA 56
## # A tibble: 10 x 4
## names contbr_city contbr_st n
## <fctr> <fctr> <fctr> <int>
## 1 Bush NEW YORK NY 1397
## 2 Bush HOUSTON TX 907
## 3 Bush WASHINGTON DC 755
## 4 Bush MIAMI FL 715
## 5 Bush DALLAS TX 572
## 6 Bush TALLAHASSEE FL 408
## 7 Bush CORAL GABLES FL 392
## 8 Bush ALEXANDRIA VA 342
## 9 Bush TAMPA FL 305
## 10 Bush ATLANTA GA 283
## # A tibble: 10 x 4
## names contbr_city contbr_st n
## <fctr> <fctr> <fctr> <int>
## 1 Trump HOUSTON TX 7618
## 2 Trump DALLAS TX 4098
## 3 Trump LAS VEGAS NV 4061
## 4 Trump SAN ANTONIO TX 3991
## 5 Trump NEW YORK NY 3476
## 6 Trump NAPLES FL 2909
## 7 Trump SAN DIEGO CA 2885
## 8 Trump AUSTIN TX 2817
## 9 Trump PHOENIX AZ 2768
## 10 Trump SCOTTSDALE AZ 2730
## # A tibble: 10 x 4
## names contbr_city contbr_st n
## <fctr> <fctr> <fctr> <int>
## 1 Jindal BATON ROUGE LA 81
## 2 Jindal NEW ORLEANS LA 49
## 3 Jindal METAIRIE LA 27
## 4 Jindal HOUSTON TX 20
## 5 Jindal LAFAYETTE LA 18
## 6 Jindal KENNER LA 16
## 7 Jindal LAKE CHARLES LA 16
## 8 Jindal MANDEVILLE LA 15
## 9 Jindal SHREVEPORT LA 14
## 10 Jindal HOUMA LA 13
## # A tibble: 10 x 4
## names contbr_city contbr_st n
## <fctr> <fctr> <fctr> <int>
## 1 Christie NEW YORK NY 216
## 2 Christie MENDHAM NJ 122
## 3 Christie MORRISTOWN NJ 113
## 4 Christie DALLAS TX 91
## 5 Christie SUMMIT NJ 70
## 6 Christie BASKING RIDGE NJ 66
## 7 Christie WESTFIELD NJ 58
## 8 Christie LIVINGSTON NJ 53
## 9 Christie MCLEAN VA 50
## 10 Christie PRINCETON NJ 47
## # A tibble: 10 x 4
## names contbr_city contbr_st n
## <fctr> <fctr> <fctr> <int>
## 1 Walker CARMEL IN 117
## 2 Walker NEW YORK NY 98
## 3 Walker HOUSTON TX 93
## 4 Walker MILWAUKEE WI 89
## 5 Walker DALLAS TX 81
## 6 Walker BROOKFIELD WI 77
## 7 Walker MADISON WI 66
## 8 Walker ATLANTA GA 65
## 9 Walker CHICAGO IL 65
## 10 Walker NAPLES FL 63
## # A tibble: 10 x 4
## names contbr_city contbr_st n
## <fctr> <fctr> <fctr> <int>
## 1 Stein SAN FRANCISCO CA 181
## 2 Stein PORTLAND OR 137
## 3 Stein NEW YORK NY 130
## 4 Stein SEATTLE WA 124
## 5 Stein CHICAGO IL 112
## 6 Stein BROOKLYN NY 95
## 7 Stein LOS ANGELES CA 92
## 8 Stein SAN DIEGO CA 60
## 9 Stein EUGENE OR 59
## 10 Stein TULSA OK 58
## # A tibble: 10 x 4
## names contbr_city contbr_st n
## <fctr> <fctr> <fctr> <int>
## 1 Webb ALEXANDRIA VA 36
## 2 Webb ARLINGTON VA 31
## 3 Webb CHESTERFIELD VA 22
## 4 Webb WASHINGTON DC 22
## 5 Webb NEW YORK NY 17
## 6 Webb HOUSTON TX 11
## 7 Webb TILGHMAN MD 10
## 8 Webb LOS ANGELES CA 9
## 9 Webb RICHMOND VA 9
## 10 Webb SAN FRANCISCO CA 9
## # A tibble: 10 x 4
## names contbr_city contbr_st n
## <fctr> <fctr> <fctr> <int>
## 1 Kasich COLUMBUS OH 646
## 2 Kasich NEW YORK NY 561
## 3 Kasich CINCINNATI OH 540
## 4 Kasich WASHINGTON DC 342
## 5 Kasich HOUSTON TX 263
## 6 Kasich CHICAGO IL 249
## 7 Kasich ALEXANDRIA VA 245
## 8 Kasich ATLANTA GA 226
## 9 Kasich CLEVELAND OH 203
## 10 Kasich DUBLIN OH 185
## # A tibble: 10 x 4
## names contbr_city contbr_st n
## <fctr> <fctr> <fctr> <int>
## 1 Gilmore RICHMOND VA 14
## 2 Gilmore ROANOKE VA 14
## 3 Gilmore ALEXANDRIA VA 4
## 4 Gilmore MCLEAN VA 4
## 5 Gilmore NEW YORK NY 3
## 6 Gilmore VA BEACH VA 3
## 7 Gilmore AUSTIN TX 2
## 8 Gilmore FORT WORTH TX 2
## 9 Gilmore FREDERICKSBURG VA 2
## 10 Gilmore HENRICO VA 2
## # A tibble: 10 x 4
## names contbr_city contbr_st n
## <fctr> <fctr> <fctr> <int>
## 1 Lessig SAN FRANCISCO CA 87
## 2 Lessig NEW YORK NY 57
## 3 Lessig CAMBRIDGE MA 45
## 4 Lessig SAN ANSELMO CA 45
## 5 Lessig SEATTLE WA 34
## 6 Lessig CHICAGO IL 27
## 7 Lessig BROOKLYN NY 24
## 8 Lessig SAN JOSE CA 20
## 9 Lessig WASHINGTON DC 19
## 10 Lessig LOS ANGELES CA 17
## # A tibble: 10 x 4
## names contbr_city contbr_st n
## <fctr> <fctr> <fctr> <int>
## 1 Johnson NEW YORK NY 273
## 2 Johnson HOUSTON TX 176
## 3 Johnson ALBUQUERQUE NM 167
## 4 Johnson ROCHESTER NY 136
## 5 Johnson AUSTIN TX 133
## 6 Johnson CHICAGO IL 121
## 7 Johnson DALLAS TX 118
## 8 Johnson SAN DIEGO CA 111
## 9 Johnson SEATTLE WA 85
## 10 Johnson LOS ANGELES CA 83
## # A tibble: 10 x 4
## names contbr_city contbr_st n
## <fctr> <fctr> <fctr> <int>
## 1 McMullin ALLEN TX 30
## 2 McMullin SALT LAKE CITY UT 29
## 3 McMullin HUNTINGTON BEACH CA 22
## 4 McMullin GREENSBORO NC 18
## 5 McMullin ALEXANDRIA VA 17
## 6 McMullin OREM UT 16
## 7 McMullin PROVO UT 16
## 8 McMullin TALLAHASSEE FL 16
## 9 McMullin WASHINGTON DC 16
## 10 McMullin ARLINGTON VA 14
I feel this is an extremely informative presentation of the data, as one can determine the top cities where each candidate received funds from. This information is highly valuable.Looking at just Clinton, she got the most from New York, D.C., and Los Angeles, Trump from HOuston, Dallas and Nevada (southern states) and Sanders from New York, Seattle, and Los Angeles.
for (i in namelist){a<-get(i)
a$location<-paste(as.character(a$city),as.character(a$contbr_city))
a$lat<- sapply(a$location, #Getting the longitude and latitude for each
function(x) #candidate.
if (!is.na(x)){
geocode(x,source ="dsk",messaging = FALSE)$lat}
else {NA})
a$lon<- sapply(a$location,
function(x)
if (!is.na(x)){
geocode(x,source ="dsk",messaging = FALSE)$lon}
else {NA})
assign(i,a) #Assigning the new information to the candidate dataframes.
remove(a)
}
for (i in namelist){
map("world", fill=TRUE, col="white", bg="lightblue", ylim=c(-60, 90), mar=c(0,0,0,0))
points(x=get(i)$lon,y=get(i)$lat,col="red")} #Plotting each candidate dataframe by city.
This visualization of contributions by city looks fine, but I believe this can be done in a somewhat more aesthetically pleasing way.
for (i in namelist){
print(ggplot()+borders("world",colour="gray50",fill="gray50")+
geom_point(data=get(i),mapping=aes(x=lon,y=lat),col='blue') #using borders and ggplot to plot the data.
)
}
This looks better but at this point, the data can definitely be fine-tuned using other variables such as the sum of contributions.
for (i in namelist){
a<-get(i)
a$sum_of_cont<-(subset(plotCont,names==i)%>%
group_by(names,contbr_city,contbr_st) %>% #Again grouping by city and state, but to get sum of contributions
summarize(sum=sum(contb_receipt_amt), #by city this time.
n=n()) %>%
ungroup() %>%
ungroup())$sum
assign(i,a)
rm(a)
}
for (i in namelist){
print(ggplot(aes(x=lon,y=lat),data=get(i))+borders("world",colour="gray50",fill="gray50")+
geom_point(alpha=.75,aes(col=get(i)$n,size=get(i)$sum_of_cont/1000))+
scale_color_gradient2(low = "red", mid = "white", high = "blue")) #Redoing the maps with size and color affected
} #by sum and count.
These plots definitely look a lot better and contain a lot more information!
The summary for each candidate by mean, median, and sum of contributions, faceted by whether it was the primary or general elections was definitely a useful plot– reproduced below:
p1<-ggplot(aes(x=names,y=contb_receipt_amt,group=1),
data=plotCont)+
geom_bar(aes(fill='red'),stat="summary",fun.y=mean)+ #bar plot for mean of contbributions by candidate
facet_wrap(~election_tp,nrow = 2)+ #split into the primary and general elections.
geom_point(alpha=.5,size=.75,stat="summary",fun.y=median)+
geom_line(color='purple',stat="summary",fun.y=median,size=1.25)+ #line plot for median contributions.
labs(x="Candidate Names",y="Mean Contribution Amount ($)")+
theme(axis.text.x = element_text(size=10),
axis.text.y = element_text(size=15),
axis.title = element_text(size=15),
strip.text = element_text(size=15))+
guides(fill=FALSE)
p2<-ggplot(aes(x=names,y=contb_receipt_amt/1000000,group=1),
data=plotCont)+
ylab("Contribution Total ($ mil)")+ #line plot for sum of contributions by candidate split by primary and
ggtitle("Contribution by Candidate")+ #general elections.
geom_line(size=1.25,color='steelblue',stat="summary",fun.y=sum)+
theme(plot.title = element_text(hjust=.5,size=22),axis.title.x=element_blank(),
axis.text.x = element_text(size=10),
axis.text.y = element_text(size=15),
axis.title=element_text(size=15),
strip.text = element_text(size=15))+
facet_wrap(~election_tp,nrow=2)+
guides(fill=FALSE)
grid.arrange(p2,p1)
The next plot I chose, as it is the next level of abstraction, shows the differences across parties.
ggplot(aes(x=party,y=contb_receipt_amt),data=plotCont)+
geom_boxplot(aes(alpha=.1))+
xlab("Party")+ylab("Contribution Amount")+ggtitle("Contributions by Party")+
theme(plot.title = element_text(hjust=.5,size=22),
axis.title = element_text(size=15),
axis.text = element_text(size=13)
)+
coord_cartesian(ylim=c(quantile(Contributions[Contributions$party=="R",]$contb_receipt_amt,probs=.25),
quantile(Contributions[Contributions$party=="R",]$contb_receipt_amt,probs=.95)))+
guides(fill=FALSE) #plotting boxplots for both contributions by gender and party with the 25th and 95th percentiles
#used to give a better picture of the plots.
Finally, the map visualization, for the main candidates is a worthwhile display of the data and the differences across these candidates:
#Redoing the maps with size and color affected by sum and count.
for (i in c("Clinton","Sanders","Trump")){
print(ggplot(aes(x=lon,y=lat),data=get(i))+borders("world",colour="gray50",fill="gray50")+
geom_point(aes(col=get(i)$n,size=get(i)$sum_of_cont/1000))+
scale_color_gradient2(low="red",mid="orange",high="yellow")+
ggtitle(paste(i," Contributions"))+theme(axis.title = element_text(size=15),
axis.text = element_text(size=15),
plot.title = element_text(hjust=.5,size=22),
legend.title = element_text(size=12)
)+labs(color="Count",size="Sum of Contributions($Th.)")
)
}
All three candidates seem to have similar locations contributing to them. Sanders garners donors from more locations than Clinton, and Trump garners even more locations than either!! But, both Sanders and Clinton seem to have areas with more-so higher contributions, though obviously less populated.
Results:
According to the visualizations, even though Sander’s campaign was deemed very popular, and was self-financing without the helps of PACs, other candidates still did much better than him in terms of donations! Furthermore, even lesser known, and barely in the race candidates, like Lessig, received high mean and median contributions– higer than Trump, Sanders or Clinton which suggests that these candidates (though probably did get just as high donations as the other candidates) received a majority of low amount donations. Finally, it seems the three main candidates got donations from across the globe, yet, again, it seems Trump, due to his world-wide fame garnered even more (count-wise), something not many people expected at all.
For improvements and further research, I would suggest attemping to do this for local elections, and then determining whether contributions actually affect winning rates (for Which the data here suggests they do not).